29. Parentheses, Catalan Numbers and Ruin 

1. Introduction 

A sequence of zeroes and ones can represent a message, a sequence of 
data in a computer or in digital communications, but it also can represent all 
sorts of mathematical or real world objects. 

Thus a sequence of length n can represent the outcome of n coin 
tosses, or a binary valued vector of lengtli n, or a polynomial of degree 
up to n-1 with binary coefficients, or a number. 

We will now discuss three other interpretations for such a sequence. 

First, a sequence of length n can represent a subset of an n-element 

set, S. 

Each position, j, in the sequence can represent one of the n 
elements. 

A given subset, T, of S can then be represented by the n-length 
sequence having I's in the positions corresponding to the elements of T 
and O's elsewhere. 

For example, for n=6, the subset {2,3,5} can be represented as 
011010. 

The 2" binary sequences of length n then correspond to the 2" subsets 
of an n-element set. 

We already know something about subsets of a set. Thus the number 
of such subsets that have k elements in them is the binomial coefficient 
C(n,k) (which is n(k)/k!). We will deduce some more properties of collections 
of subsets from the next interpretation. 

2. Parentheses 

A another curious interpretation of binary sequences is that obtained 
by replacing each 1 by a left parenthesis and each by a right one. 
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Thus 01 1010 can be written as )(( )( ). 



The gain we get from this silly looking substitution is that there is a 
natural definition of "closing" among parentheses. If a left and right 
parenthesis are adjacent in the proper order, like ( ), we can "close" 
them with one another, and ignore them in future closings so that those 
around them become adjacent (as happens similarly with the pairings in 
the Hu Tucker algorithm). 

Thus in our example above, )(( )( ), the first two parentheses remain 
open while the last two close. 

A famous ancient question in this context is: how many distinct 
arrangements of n pairs of left-right parentheses are there all of which 
close? 

The answer to this question is called the n-th Catalan number, C(n). 
Here are the first few answers: 



We can deduce the answer to this and many similar questions in terms 
of binomial coefficients by noticing that every sequence of 2n 
parentheses, or in other terms, every subset of a 2n-element set, has 
some open part and some closed part when considered as a sequence of 
parentheses. 

This allows us to define a natural partition of all subsets of a 2n- 
element set, (or of an odd sized set as well.) 

Each block of the partition consists of all subsets which have the 
same closed part in the same locations. 

Thus, in our original example, )(( )( ), the closed part consists of the 
last 4 parentheses, and the first two represent its open part. 



C(l)=l 
C(2)=2 
C(3)=5 



() 

00 and (0) 

000, 0(0), (0)0, (00) and ((())) 
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The magical fact that makes this information extraordinarily useful, is 
that open parts are extremely simple. No open part can contain a left 
parenthesis, (, followed by a right parenthesis, ). 

Thus, open parts, wherever they occur in a sequence, all either 
consist of all parentheses of the same kind, or some right ones followed 
by some left ones. 

If, for example, the open part of a sequence has four elements, the 
only possibilities are ((((, )(((, ))(( , )((( and )))), wherever they are located. 

The binary sequences that correspond to these possibilities are 1111, 
0111,0011,0001 and 0000. 

The sets corresponding to them are: none of them, the last element, 
the last two elements, the last three elements, and all four elements. 

Here and in general these sets form what we can call a full chain. 

Each set in a full chain contains or is contained in all the others, 
and there are no gaps among them: every set in the middle of one has 
another set in the chain with one more element and another with one 
less element. 

What other sequences have the same closed part as our example 

)(()()? 

We can replace the open part by any other open part of the same 
size and location: here we can replace )( at the beginning, by )) and also by 
((. 

The results here, expressed as binary sequences, are that 001010, 
01 1010 and 1 1 1010 all have the same closed part and hence lie in the 
same block in this partition. Considered as subsets these are {3,5}, {2,3,5} 
and {1,2,3,5}. 

The subsets of a set, S, that have the same closed parts all have 
open parts in the same locations, and therefore inherit the wonderful 
properties of open parts. 
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Recall that the set S corresponding to n pairs of parentheses must have 
2n elements all together, one for each parenthesis. 

We will quickly be able to read off what the Catalan number C(n) is 
from these properties. 

Once again, these properties are 

1. Each block in this partition is a full chain, and is completely 
ordered by inclusion. 

2. Each block is symmetric about the middle size (ISI/2). If, as 
here, S has cardinality 2n, their sizes are symmetric about n. This 
statement follows from the fact that the closed parts all have the same 
number of left parentheses as right ones and so the corresponding sets 
have one element for each closed parenthesis. The open parts are also 
symmetric about the middle size. 

3. Full chains have no gaps in the them. They have an element of 
every size from some minimum, 2n/2 - k to 2n/2 + k. 

In short, each block has some smallest set in it, and then consists of 
what you get by adding the elements in the open part one at a time 
(highest index first) to that set until all are added. 

We can draw the following conclusions from these facts: 

1. Every set of size 2n/2+j is in one full chain of this kind that 
stretches from at most size 2n/2-j to at least 2n/2+j. 

2. Any such chain contains at least 2j+l member subsets. 

3. Every one of these chains of length at least 2j+l contains a set of 
size n +j. 

We may deduce from the first and third fact that the number of 
chains of length 2j+l or more is the number of subsets of S of size n+j. 

This latter is the binomial coefficient C(2n, n+j). 
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We can also answer the question: 



How many of our chains in this partition have length exactly 

2j+l? 

These are chains of length at least 2j+l but no longer . 

They must all have a member having n+j elements, but no member 
having n+j+1 elements. 

The number of these must therefore be the difference in binomial 
coefficients: 

C(2n, 2n/2 +j) - C(2n,2n/2+j+l). 

(The first term counts how many of our chains have a member of size m+j 
while the second tells us how many of these have larger elements as well 
and so are longer.) 

Now we have found the Catalan number and much more! 

For, parentheses that close completely, which the Catalan numbers 
count count, are exactly those that have no open part and therefore lie in 
chains having exactly one member. This is the j=0 answer here, which 
is: 

C(n) = C(2n,n) - C(2n,n+1). 

3. Gambling Sequences 

It turns out that the open and closed parts of parentheses have 
meaning in terms of gambling sequences. 

For, if we make a left parenthesis correspond to a win, and a right one 
to a loss, in every closed part and in every portion thereof, there are always 
at least as many wins as losses counting from the left-hand beginning to any 
point. 

In other words, you never fall behind from being in a closed part. 
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In fact the first time you fall behind is at the first ) of the open part of 
your sequence, and the maximum you are ever behind is the number of left 
parentheses with which your open part begins. 

Suppose you gamble by betting on the outcome of a series of coin 
tosses or roulette wheel spins, and you bet the same amount, on each toss or 
turn of the wheel, and either win or lose one unit per event. 

Suppose further that you start with a stake of X units, and can stay for 
N events unless you lose all your money. 

You will be wiped out in any sequence in which you ever lose X 
events more than you have won. 

This means that among the gambling sequences corresponding to sets 
in one of our full chains, if the chain has length X+q then you will be wiped 
out in exactly q of its sequences. ( You will not be wiped out in those 
sequences having up to X-1 starting left parentheses and will be wiped out 
otherwise.) 

Thus you get at most X sequences in which you survive to play N 
events from each chain (and fewer if the chain has fewer members than X). 

But the number of sequences obeying this condition, at most X from 
any chain, is exactly the number of sets of the X middle sizes. These also 
have X members in each chain of length at least X and contain all elements 
of all shorter chains. 

The probability of not being wiped out after N events, when odds 
of winning are even, is therefore the sum of the X middle (i.e, largest) 
binomial coefficients divided by 2^. 

We can also ask: 

What is the probability of winning k times more than you lose given 
that you must drop out and lose your stake if you ever fall X behind? 
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There are C(N,k/2+N/2) gambling sequences of length N for which 
you end up winning k more times than you lose. But we must throw out 
those that start with X or more right parentheses in their open parts. (In them 
you are wiped out.) 

Suppose an open part has Q parentheses in it. If you win k more times 
than you lose, there must be Q/2+k/2 left parentheses and Q/2-k/2 right 
parentheses in the open part, since the closed part balances between wins 
and losses. You are wiped out if the latter number is at least X since these 
will be sequences in which you fall Q/2-k/2 behind before you start winning. 

Thus the number of win k non wipe out sequences is C(N,k/2+N/2) 
less the number of chains of length Q+1 with Q=X+k/2 or more, (each of 
which will contain a wipeout "net k win" sequence). 

We have counted such chains before and the latter number is 
C(N,N/2+X+k/2), so that the number of non-wipeout k win sequences is: 



C(N,(k+N)/2) - C(N,(k+N)/2+X) 



unless of course k is -X or less in which case you are definitely wiped out, 
while the expression here can become negative. 

This information allows you to compute your probability of winning 
whatever given an initial stake with any constant probability of winning, p, 
by summing the number above multiplied by kp*"^^^^^(l-p)'^^"'^^^^ over k 
above -X and subtracting X p^''^^^^\l-p)^^"''^^2C(N,(k+N)/2-X) in this range 
and Xp^''^'^^^\l-p)^'^"''^'^C(N,(k+N)/2) for k at most -X, to account for your 
losses when you are wiped out. 

Exercise: Use a spreadsheet to evaluate this for p=.49, N=100, and X =5 
and 10. Do the same for p=.51. What do you find? 



4. AppHcations to Extremal Set Theory 

The partition of the subsets into blocks defined by closed parts has 
other implications that are worth noting, in the area of "extremal set theory." 
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In particular we will deduce an upper bound for the size of an 
antichain of subsets (defined below) from it, as well as an upper bound to the 
number of antichains of subsets. 

We define an antichain of subsets of a set S to be a collection of its 
subsets having the property that no two of its members are ordered by 
inclusion. 

In short no member contains another. 

We ask then: 

How many members can an antichain have? 

The answer to this question, (which is called "Sperner's Theorem"), is 
the largest binomial coefficient C(N, [N/2]) when we are considering 
subsets of S and S has N elements. 

This answer follows immediately from the following two facts. 

1 . Since the chain-blocks of our partition are completely ordered by 
inclusion, an antichain can have at most one member from each block. 

2. Since each of our chain blocks contains a member of size [N/2], the 
size of an antichain, (which is at most the number of blocks as follows from 
the previous observation), is at most the number of [N/2] size subsets of S, 
which is C(N,[N/2]). 

Since the sets of size [N/2] form an antichain of this size, this bound 
cannot be improved. 

Another curious question (first posed by Dedekind in the 19th 
Century) is: 

How many antichains of subsets of an N element set are there? 

There are 2*^^'^'^'^^^^^ subsets of the antichain consisting of sets all of 
size [N/2] and all these are antichains., so this is a lower bound. 
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Hansel found an upper bound of 3 ^ ' for this number by the 
following argument: 

1. An antichain A of subsets of S corresponds to a function, f(A) 
defined on all subsets of S which is 1 on every member of the antichain and 
any set containing such a member, and on sets not containing any member. 

(Such a function is called a monotone Boolean function, monotone 
because when it is 1 on some set q and r contains q it is 1 on r as well, and 
Boolean because its values are and 1) 

2. We can define all such monotone Boolean functions by defining 
them on each chain, and if we do so starting on the short chains first, we 
have at most 3 possibilities on each chain. 

Why is this so? a monotone Boolean function can have at most k+1 
values on a chain of length k: 

Thus, if some member gets value one, everything above it must get 
value 1, and if it gets everything below it must get 0. It is completely 
defined by specifying the smallest 1 valued member if any. 

Thus the possible monotone Boolean functions on a 3 element chain 
can be described as 000, 001, 01 1 and 1 1 1. 

But if we start defining a function on short chains, the values on them 
have implication for the larger chains as well. 

In particular, suppose we look at three successive members of a 
large chain; these will have the form )), )( and (( In 2 successive places In 
their open parts, and be otherwise Identical. 

This means that the set z with ( ) in these same places in their open 
part and otherwise identical lies in a shorter chain (it has a bigger closed 
part) and hence will already be defined when it comes to defining our 
function on this chain. 

But if z has been assigned value then the set with corresponding 
parenthesese )) in these two places in our chaine must get value as well 
since it is smaller than z and z has value 0. 
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Similarly if z has been assigned value 1 then the set with 
corresponding parentheses (( in these places must get the value 1 as well, 
since it is bigger than z. 

This implies that when it comes to define a monotone Boolean 
function on each our chains, at least one of every three members of each 
chain must be predetermined by their values on shorter chains, so there can 
be at most 2 members whose definition is still open. 

This means there are only at most 3 possible definitions per chain 
(00,01 or 11 on these members), which means we can define all possible 
monotone Boolean functions with at most 3C(^'[^^2]^ possibilities all 
together. 

(By careful argument this upper bound can be reduced to something 
whose logarithm is asymptotic to the lower bound.) 

Exercise: How many members can a collection of subsets of an N 
element set S have If no three members, A B and C have A contains B 
contains C? 

5. Catalan Numbers in other Contexts 

Catalan numbers appear in many other places in mathematics. In 
particular, suppose you take a polygon and draw it in the plane so that its 
interior is convex, (the line between any two interior points is entirely in the 
interior) 

We can ask: how many ways are there to introduce chords to your 
polygon so that all interior faces are triangles? 

The answer is a Catalan number. 

Exercise: For a polygon having N vertices , which Catalan number is it? 

We found above a way to evaluate Catalan numbers that had a 
certain generalization, based upon lots of talk and little calculation. 
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There is another way to evaluate Catalan numbers that gives an 
equally simple formula equivalent to the previous answer of course, but 
different and with an entirely different generalization, that is also talk with 
practically no calculation. 

And here it is. 

Suppose we take a good parenthesization, one that closes 
completely, with n parentheses, and add to it at the end another extra right 
parenthesis. 

You obviously get 2n+l parentheses, with one more right one than 
left ones, and these correspond to subsets of size n of a 2n+l element set . 

We can show, by appropriate talk, that if you take the 01 sequence 
that represents this set and take all its cyclical permutations, and do this for 
all good n-parenthesizations, you get sequences that correspond to all n 
element sets that are chosen from among 2n+l elements, and get each set 
exactly once. 

This means that each good n parenthesization corresponds to one 
unique cycle of n element subsets of a 2n+l element set. 

Since there are 2n+l cyclic permutations of a sequence of length 
2n+l, we find: C(n) = C(2n+l,n)/(2n+l). 

Exercise: Show that this formula is the same as the previous one. 

Suppose we represent our parentheses and sets by sequences not ot 1 
and but instead of 1 and -1 . Then a good parenthesization has all the partial 
sums of its sequences non-negative, since every -1 or right parenthesis must 
have its partner left parenthesis or +1 to its left. 

Similarly the partial sums from the right are non positive. 

If we start with a good parenthesization and add a right parenthesis - 
here a -1 on its right, every partial sum but the whole will still be non- 
negative. 
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On the other hand if we make any cycUc permutation, then the 
resulting sequence will begin with a partial sum of the original from the 
right, then the -1 then the rest. When you hit the -1 the partial sum will 
definitely be negative. 

This means that the cyclic permutations of the sequence obtained 
from one good parenthesization, cannot be obtained from any other good 
parenthesization. 

On the other hand, if you take partial sums of any 1-1 sequence of 
length 2n+l with 2n ones, they will have some minimum value. If you start 
right after the corresponding first minimum point, all partial sums up to the 
last one will be non-negative, so that removing the -1 at that point and 
starting beyond it will give a good n parethesization. 

In short, there is a one to one correspondence between good n 
parenthesizations and cyclic permutations of n element subsets of a 2n+l 
element set and that is the proof of our formula. 

Now suppose we consider a different kind of parenthesization, 
where one left parenthesis closes with 2 (or more generally with k) right 
parentheses. 

Here are examples for n=2 ( ()))), ( ) ( )) ), ( )) ( )) 
in terms of 01 sequences thev look like 1 100 00, 10 100 0, 100 100 

We can count these by the identical argument previously used. Take 
a good parenthesization here, add an extra right parenthesis at the right end, 
and argue that cyclic permutations of these give all sets of size n, now out of 
3n+l elements. 

The answer is therefore C(3n+l,n)/(3n+l) for the number of these 

objects. 

Exercises: What is the general formula if each left parenthesis closes 
with k right ones in the same way as considered here for k=2? 
Find the answers for k=2 and n=3 and verify the correctness of the 
formula above. 
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There is a generalization of the Catalan numbers as follows. Suppose each 
left hand parenthesis closes with two right hand ones, (or with k of them but 
we stick with 2 for the moment.) How many arrangements of n such 
parentheses are there? 

For n=l there is 1, for n=2 you can have (()))! or Q_Q)1 or 0)0). 

To be a legal parenthesisation, there must be some adjacent triple of 
the form ()) and you can close it, and remove it, and the same must be true 
on what remains. 
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